Abstract

Current techniques for generating a knowledge space, such as QUERY, guarantee
that the resulting structure is closed under union, but not that it satisfies
wellgradedness, which is one of the defining conditions for a learning space. We
give necessary and sufficient conditions on the base of a union-closed set family
that ensure that the family is well-graded. We consider two cases, depending on
whether or not the family contains the empty set. We also provide algorithms for
efficiently testing these conditions, and for augmenting a set family in a minimal
way to one that satisfies these conditions.

Introduction 

A family of sets $\mathcal{F}$ is well-graded if any two sets in $\mathcal{F}$ can be connected by a
sequence of sets formed by single-element insertions and deletions, without redundant
operations, such that all intermediate sets in the sequence belong to $\mathcal{F}$. The family
$\mathcal{F}$ is called $\cup$-closed if it is closed under union. (Formal definitions are given in our
next section.) Well-graded families are of interest for theorists in several different areas
of combinatorics, as various families of sets or relations are well-graded. For example,
Theorems 2 and 4 in Bogart (1973) imply that the family of all partial orders on
a finite set is well-graded. The same property of well-gradedness is shared by other
families, such as the semiorders, the interval orders, and the biorders, again on finite
sets (Doignon and Falmagne, 1997). Via representation theorems, this concept also
applies to the partial cubes, to wit, graphs isometrically embeddable into hypercubes
(Graham and Pollak, 1971; Djoković, 1973; Winkler, 1984; Imrich and Klavžar, 2000),
and to the oriented media, which are semigroups of transformations satisfying certain
axioms (Falmagne, 1997; Eppstein et al., 2007).

When the family $\mathcal{F}$ is well-graded, $\cup$-closed and contains the empty set, one
obtains an object variously called an antimatroid (Korte et al., 1991), a learning space
(Cosyn and Uzun, 2008; Falmagne et al., 2006), or a well-graded knowledge space
(Doignon and Falmagne, 1985). The monograph of Doignon and Falmagne (1999) contains
a comprehensive account of this topic. Learning spaces are applied in mathematical
modeling of education. In such cases, the ground set is the collection of problems, for
example in elementary arithmetic, that a student must learn to solve in order to master
the subject. The family $\mathcal{F}$ then contains all the subsets forming the feasible knowledge
states. In practice, the size of such a family is quite large, typically containing millions
of states¹, which raises the problem of summarizing $\mathcal{F}$ efficiently. An obvious choice for
this purpose is the base of that family, namely the unique minimal subset of $\mathcal{F}$ whose
completion via all possible unions gives back $\mathcal{F}$.

For various reasons, when building a learning space in practice, one may fall short
of some sets to achieve well-gradedness, a property regarded as essential for promoting
efficient learning (see the axiomatization of Cosyn and Uzun, 2008). This raises the
problems of uncovering possibly missing sets, and completing the family economically
and/or optimally. These considerations inspired the work presented here.

We solve the following problems for a finite family $\mathcal{G}$ of finite sets.

1. Find necessary and sufficient conditions for $\mathcal{G}$ to be the base of a well-graded
$\cup$-closed family of sets.

2. Find such conditions when the well-graded $\cup$-closed family of sets is known to be
a learning space, that is, the family contains the empty set. (These conditions
may be simpler than in Case 1.)

3. Provide efficient algorithms for testing these conditions on a family $\mathcal{G}$ and
uncovering possibly missing sets. (Different algorithms may be used in Problems 1
and 2.)

4. Supposing that some family $\mathcal{G}$ fails to satisfy the conditions in Problems 1 or 2,
provide algorithms for modifying $\mathcal{G}$ in some optimal sense to yield a family $\mathcal{G}^*$
satisfying such conditions.



Except for the passing remark involving Counterexample 12, only finite sets are
considered in this paper.



Background and Preparatory Results 

1 Definition. Let $\mathcal{F}$ be a family of subsets of a set $\mathcal{X}$. A tight path between two
distinct sets $P$ and $Q$ (or from $P$ to $Q$) in $\mathcal{F}$ is a sequence $P_0 = P, P_1, \ldots, P_n = Q$ in
$\mathcal{F}$ such that $d(P, Q) = |P \,\triangle\, Q| = n$ and $d(P_i, P_{i+1}) = 1$ for $0 \le i \le n - 1$.
The family $\mathcal{F}$ is well-graded, or a wg-family, if there is a tight path between any two
of its distinct sets (cf. Doignon and Falmagne, 1997; Falmagne and Doignon, 1997)².

¹ For a ground set that may contain a couple of hundred problem types.
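Definition 1 can be checked directly on a small, explicitly listed family. The following is a minimal sketch, not part of the original text: the function name `is_well_graded` is hypothetical, and the approach (a breadth-first search over members that differ by one element) is simply one brute-force way to test for tight paths. A tight path from $P$ to $Q$ exists exactly when the shortest such path has length $|P \,\triangle\, Q|$.

```python
from collections import deque

def is_well_graded(family):
    """Return True if every pair of sets in `family` is joined by a tight path."""
    sets = {frozenset(s) for s in family}
    ground = frozenset().union(*sets)
    for source in sets:
        # Breadth-first search over members that differ by exactly one element.
        steps = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for x in ground:
                neighbour = current ^ {x}  # insert x if absent, delete it if present
                if neighbour in sets and neighbour not in steps:
                    steps[neighbour] = steps[current] + 1
                    queue.append(neighbour)
        # A tight path exists exactly when the path length equals the distance d.
        if any(steps.get(target) != len(source ^ target) for target in sets):
            return False
    return True
```

This runs in time polynomial in the size of the listed family, so it is only useful when the family itself (rather than its base) is small enough to write down.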



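2 Definition. A family of sets $\mathcal{F}$ is closed under union, or $\cup$-closed, if for any
nonempty³ $\mathcal{G} \subseteq \mathcal{F}$ we have $\cup\mathcal{G} \in \mathcal{F}$. A well-graded family closed under union and
containing the empty set is a learning space.

3 Definition. The span of a family of sets $\mathcal{G}$ is the family $\mathcal{G}'$ containing any set which
is the union of some subfamily⁴ of $\mathcal{G}$. In such a case, we write $\mathbb{S}(\mathcal{G}) = \mathcal{G}'$ and we say
that $\mathcal{G}$ spans $\mathcal{G}'$. By definition $\mathbb{S}(\mathcal{G})$ is thus $\cup$-closed. A base of a $\cup$-closed family $\mathcal{F}$ is
a minimal subfamily $\mathcal{B}$ of $\mathcal{F}$ spanning $\mathcal{F}$ (where 'minimal' is meant with respect to set
inclusion: if $\mathbb{S}(\mathcal{H}) = \mathcal{F}$ for some $\mathcal{H} \subseteq \mathcal{B}$, then $\mathcal{H} = \mathcal{B}$). Notice that if $\varnothing \in \mathcal{F}$, we must
have $\varnothing \in \mathcal{B}$, with $\cup\{\varnothing\} = \varnothing$. In such a case, we use the abbreviation
$\mathring{\mathcal{B}} = \mathcal{B} \setminus \{\varnothing\}$.
Note that a family $\mathcal{G}$ spanning a family $\mathcal{F}$ is a base of $\mathcal{F}$ if and only if none of the sets
in $\mathcal{G}$ is the union of some other sets in $\mathcal{G}$.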
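For small families, the span and the base of Definition 3 can be computed by brute force. The sketch below is illustrative only: the names `span` and `base` are hypothetical, and `base` assumes its input is a finite union-closed family. It uses the fact that a set is a union of other members exactly when the union of all its proper subsets in the family gives the set back.

```python
def span(family):
    """All unions of nonempty subfamilies of `family` (may be exponentially large)."""
    result = set()
    for s in map(frozenset, family):
        # Every union already found can be extended by s; s alone is also a union.
        result |= {s | t for t in result} | {s}
    return result

def base(union_closed_family):
    """Sets of a finite union-closed family that are not unions of other members."""
    sets = {frozenset(s) for s in union_closed_family}
    kept = set()
    for s in sets:
        proper = [t for t in sets if t < s]
        # s is a union of other members iff the union of its proper subsets is s.
        if not proper or frozenset().union(*proper) != s:
            kept.add(s)
    return kept
```

Because the empty subfamily is not allowed here, the empty set is never a union of other sets, so `base` keeps it whenever it is present.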

Any finite $\cup$-closed family has a base, which is unique. This uniqueness property
of the base also holds in the infinite case, but some infinite families have no base: take,
for example, the collection of all open sets of $\mathbb{R}$, or Counterexample 12. (In this regard,
see Doignon and Falmagne, 1999, Theorems 1.20 and 1.22.)
The following lemma is a key tool, as it allows us to infer the wellgradedness of a
family from that of its base.

4 Lemma. The span of a wg-family is well-graded. 

Proof. Let $\mathbb{S}(\mathcal{G})$ be the span of some wg-family $\mathcal{G}$. Take any two distinct $X, Y$ in
$\mathbb{S}(\mathcal{G})$. Since $\mathbb{S}(\mathcal{G})$ is $\cup$-closed by definition, $X \cup Y$ is in $\mathbb{S}(\mathcal{G})$ and we have
$d(X, Y) = d(X, X \cup Y) + d(X \cup Y, Y)$. Accordingly, it suffices to prove that there is in
$\mathbb{S}(\mathcal{G})$ a tight path

(1)  $X_1 = X, X_2, \ldots, X_n = X \cup Y$,

with in fact $X_i \subset X_{i+1}$, $1 \le i \le n - 1$. By definition of the span, there exist finite
$\mathcal{H}, \mathcal{K} \subseteq \mathcal{G}$ such that $X = \cup\mathcal{H}$ and $Y = \cup\mathcal{K}$. Without loss of generality (exchanging the
roles of $X$ and $Y$ if needed), we can assume that there exists some $K \in \mathcal{K}$ such that
$K \setminus X \neq \varnothing$. Choose $H \in \mathcal{H}$ arbitrarily. By the wellgradedness of $\mathcal{G}$, there is a tight
path $H_1 = H, \ldots, H_m = K$. Let $k$ be the first index such that $H_k \setminus X \neq \varnothing$. (Such an
index must exist because $K \setminus X \neq \varnothing$.) We necessarily have $|H_k \setminus X| = 1$. Defining
$X_2 = (\cup\mathcal{H}) \cup H_k$, we obtain $X_1 = X \subset X_2 \subseteq X \cup Y$ with $|X_2 \setminus X_1| = 1$. An induction
completes the proof. □

Note however that the base of a $\cup$-closed wg-family need not be well-graded.



² This concept was introduced earlier under a different name; see Kuzmin and Ovchinnikov (1975),
Ovchinnikov (1980).

³ For some authors, the subfamily $\mathcal{G}$ may be empty, with $\cup\varnothing = \varnothing$. So, a $\cup$-closed family
automatically contains the empty set. We do not use this convention here.

⁴ Contrary to the convention used by Doignon and Falmagne, the empty subfamily of $\mathcal{G}$ is
not allowed; so $\varnothing \in \mathbb{S}(\mathcal{G})$ only if $\varnothing \in \mathcal{G}$.
5 Example. The $\cup$-closed wg-family

(2)  $\mathcal{F} = \{\varnothing, \{a\}, \{b\}, \{c\}, \{a,b\}, \{a,c\}, \{b,c\}, \{c,d\}, \{a,b,c\},$
       $\{a,c,d\}, \{b,c,d\}, \{a,b,c,d\}, \{a,b,c,d,e\}\}$

has the base $\{\varnothing, \{a\}, \{b\}, \{c\}, \{c,d\}, \{a,b,c,d,e\}\}$, which is not well-graded. Moreover,
$\mathcal{F}$ has two different minimal well-graded subfamilies spanning $\mathcal{F}$:

(3)  $\{\varnothing, \{a\}, \{b\}, \{c\}, \{a,b\}, \{a,c\}, \{c,d\}, \{a,b,c\},$
       $\{a,c,d\}, \{a,b,c,d\}, \{a,b,c,d,e\}\}$,

(4)  $\{\varnothing, \{a\}, \{b\}, \{c\}, \{a,b\}, \{b,c\}, \{c,d\}, \{a,b,c\},$
       $\{b,c,d\}, \{a,b,c,d\}, \{a,b,c,d,e\}\}$.

6 Example. Notice that the base of a family which is both $\cup$-closed and $\cap$-closed
(that is, closed under intersection) is not necessarily well-graded. Indeed, consider the
family

$\mathcal{G} = \{\varnothing, \{a\}, \{b\}, \{d\}, \{a,b\}, \{a,d\}, \{b,d\}, \{a,b,c\}, \{a,b,d\},$
       $\{a,b,c,d\}, \{a,b,c,d,e\}\}$,

for which $\{\varnothing, \{a\}, \{b\}, \{d\}, \{a,b,c\}, \{a,b,c,d,e\}\}$ is the base.

Main Results

7 Theorem. Let $\mathcal{F}$ be a $\cup$-closed family with base $\mathcal{B}$. Then $\mathcal{F}$ is a wg-family if and
only if, for any two distinct sets $K$ and $L$ in $\mathcal{B}$, there is a tight path in $\mathcal{F}$ from $K$ to
$K \cup L$. If $\mathcal{B}$ contains the empty set, then $\mathcal{F}$ is well-graded if and only if there is a tight
path from $\varnothing$ to $K$ for any $K$ in $\mathcal{B}$.

Thus, this result provides a solution to Problems 1 and 2. Another solution to
Problem 2 is given by Lemma 19.

Proof. As $\mathcal{F}$ is $\cup$-closed with base $\mathcal{B}$, the necessity is clear for both statements.
To establish that the sufficiency in the first statement also holds, we point out that the
family $\mathcal{B}^*$ defined by

(5)  $M \in \mathcal{B}^* \iff M = \cup\mathcal{A}$ for some $\mathcal{A} \subseteq \mathcal{B}$ such that
       $K \subseteq \cup\mathcal{A} \subseteq K \cup L$ for some $K, L \in \mathcal{B}$

includes $\mathcal{B}$ since $K = \cup\{K\}$ and $K \subseteq \cup\{K\} \subseteq K \cup L$ for any $K$ and $L$ in $\mathcal{B}$. Since
$\mathcal{B} \subseteq \mathcal{B}^* \subseteq \mathcal{F}$, the family $\mathcal{B}^*$ spans $\mathcal{F}$. We claim that $\mathcal{B}^*$ is well-graded, which implies
by Lemma 4 that $\mathcal{F}$ is well-graded. The main line of our argument is similar to that
used in the proof of Lemma 4.

Take any two distinct $V, W \in \mathcal{B}^*$. By definition of $\mathcal{B}^*$, we have $V = \cup\mathcal{V}$ and
$W = \cup\mathcal{W}$ for some subfamilies $\mathcal{V}$ and $\mathcal{W}$ of $\mathcal{B}$. Suppose that $d(V, V \cup W) = n$. We
have to show that there exists in $\mathcal{B}^*$ a tight path

       $V_0 = V, V_1, \ldots, V_n = V \cup W$
from $V$ to $V \cup W$. Without loss of generality (exchanging the roles of $V$ and $W$ if
needed), we can assume that there is some $H \in \mathcal{W}$ such that $H \setminus V \neq \varnothing$. Choose
$G \in \mathcal{V}$ arbitrarily. Then $G \subseteq H \cup G \subseteq V \cup W$, with $H$ and $G$ in $\mathcal{B}$. By hypothesis,
there is a tight path $G_0 = G, G_1, \ldots, G_m = G \cup H$ from $G$ to $G \cup H$ in $\mathcal{F}$, with
$G \subseteq G_i \subseteq G \cup H$ and $d(G, G_i) = i$ for $1 \le i \le m$. Let $k$ be the first index such
that $G_k \setminus V \neq \varnothing$. (Such an index must exist because $H \setminus V \neq \varnothing$.) We necessarily
have $|G_k \setminus V| = 1$. Defining $V_1 = (\cup\mathcal{V}) \cup G_k$, we obtain $V_0 = V \subset V_1 \subseteq V \cup W$ with
$|V_1 \setminus V_0| = 1$. An induction completes the proof of the sufficiency for the first statement.

We now show that if $\varnothing \in \mathcal{B}$ and there is a tight path from $\varnothing$ to $K$ for any $K$ in $\mathcal{B}$,
then there is a tight path from $L$ to $K \cup L$ for any $K$ and $L$ in $\mathcal{B}$. Thus, the sufficiency
of the second statement follows from that in the first statement. Indeed, let
$K_0 = \varnothing, K_1, \ldots, K_n = K$ be a tight path. It is easily seen that, after removal of
identical terms if need be, the sequence $K_0 \cup L = L, K_1 \cup L, \ldots, K_n \cup L = K \cup L$ is a
tight path from $L$ to $K \cup L$. □
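The condition of Theorem 7 can be tested naively on small bases by building the span explicitly. The sketch below is an illustration under that assumption, not the efficient procedure developed later in the paper; the names `span`, `tight_path_up` and `satisfies_theorem_7` are hypothetical. It relies on the observation that a tight path between nested sets $K \subseteq K \cup L$ can only add elements, one at a time.

```python
def span(family):
    """All unions of nonempty subfamilies of `family` (exponential in general)."""
    result = set()
    for s in map(frozenset, family):
        result |= {s | t for t in result} | {s}
    return result

def tight_path_up(members, start, target):
    """True if `target` is reachable from `start` by adding one element at a
    time while staying inside `members`."""
    stack, seen = [start], {start}
    while stack:
        current = stack.pop()
        if current == target:
            return True
        for x in target - current:
            larger = current | {x}
            if larger in members and larger not in seen:
                seen.add(larger)
                stack.append(larger)
    return False

def satisfies_theorem_7(base_family):
    """Check for a tight path from K to K | L in the span, for all distinct K, L."""
    base_sets = [frozenset(s) for s in base_family]
    members = span(base_sets)
    return all(tight_path_up(members, K, K | L)
               for K in base_sets for L in base_sets if K != L)
```

Applied to the base of Example 5, the check succeeds, in line with the family (2) being well-graded even though its base is not.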

8 Remark. The set $\mathcal{B}^*$ constructed in the proof of Theorem 7 is not necessarily a
minimal wg-family spanning $\mathcal{F}$. Indeed, the definition of $\mathcal{B}^*$ by (5) includes all the
unions $\cup\mathcal{A}$, while only some of them may be needed. An example was provided by
the wg-family of Example 5. In this case, each of (3) and (4) is a minimal wg-family
including the base and spanning the wg-family $\mathcal{F}$ defined by (2). The set $\mathcal{B}^*$ in this
case would be the union of the two families in (3) and (4), which is in fact equal to $\mathcal{F}$.



In the case of learning spaces, Koppen (1998) obtained a different, but equivalent
answer to Problem 1 (see Theorem 13). As shown by Counterexample 14, Koppen's
result does not generalize to the case in which the family does not contain the empty
set. We review this result below. To this end, we recall some concepts and results of
Doignon and Falmagne (1999), which we adapt to the general case in which the empty
set is not assumed to belong to the family⁵. Even though the proofs of Theorems
10 and 11 are essentially those of Theorems 1.25 and 1.26 in Doignon and Falmagne
(1999), we include those proofs for completeness because our context is more general.



9 Definition. For any $x$ in $\mathcal{X} = \cup\mathcal{F}$, where $\mathcal{F}$ is a $\cup$-closed family, an atom at $x$ is a
minimal set of $\mathcal{F}$ containing $x$ (where 'minimal' is with respect to set inclusion). A set
$X$ in $\mathcal{F}$ is called an atom⁶ if either $X = \varnothing \in \mathcal{F}$, or there is some $x \in X$ such that $X$ is
an atom at $x$. Writing $\mathcal{P}(\mathcal{F})$ for the power set of $\mathcal{F}$, we denote by $\sigma(x)$ the collection
of all the atoms at $x$ and refer to $\sigma : \mathcal{X} \to \mathcal{P}(\mathcal{F})$ as the surmise function of $\mathcal{F}$. Clearly,
since $\mathcal{X}$ is finite, we have $\sigma(x) \neq \varnothing$ for every $x \in \mathcal{X}$; thus, there is at least one atom at
every point of $\mathcal{X}$. (But see Counterexample 12.)



10 Theorem. A nonempty set $X$ in a $\cup$-closed family $\mathcal{F}$ is an atom if and only if
$X \in \mathcal{H}$ for any subfamily $\mathcal{H}$ of $\mathcal{F}$ satisfying $\cup\mathcal{H} = X$.



⁵ All of Doignon and Falmagne (1999)'s results were developed in the context of knowledge spaces,
that is, $\cup$-closed families containing the empty set. We drop the latter condition here.

⁶ Our meaning of the term 'atom' is different from its usage in lattice theory; cf. Birkhoff (1967),
Davey and Priestley (1990). It also slightly differs from that in Doignon and Falmagne (1999) because



we do not allow the empty union of a family (see Footnote 4, 5 and 6). 
Proof. (Necessity.) Suppose that $X$ is an atom at some $x \in \cup\mathcal{F}$, with $X = \cup\mathcal{H}$
for some subfamily $\mathcal{H}$ of $\mathcal{F}$. Then $x \in Y$ for some $Y \in \mathcal{H}$, with necessarily $Y \subseteq X$.
This implies $Y = X$ because $X$ is a minimal set containing $x$, and so $X \in \mathcal{H}$. The case
of the atom $\varnothing$ is straightforward.

(Sufficiency.) If some $X \in \mathcal{F}$ is not an atom, then for each $x \in X$, we must have
$x \in Y(x) \subset X$ for some $Y(x) \in \mathcal{F}$. Writing $\mathcal{H} = \{Y(x) \mid x \in X\}$, we get $\cup\mathcal{H} = X$,
with $X \notin \mathcal{H}$. □

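11 Theorem. The base of a $\cup$-closed family $\mathcal{F}$ is the collection of all its atoms.

Proof. Let $\mathcal{A}$ be the collection of all the atoms of $\mathcal{F}$. We claim that $\mathcal{A}$ must be
the base of $\mathcal{F}$. If $\varnothing \in \mathcal{F}$, we have $\varnothing \in \mathcal{A}$ by definition with $\cup\{\varnothing\} = \varnothing$. Notice that,
for any $X \neq \varnothing$ in $\mathcal{F}$, the set $\mathcal{A}_X = \{Y \in \mathcal{A} \mid \exists x \in X,\ x \in Y \subseteq X\}$ exists because
there is an atom at every point of $X \subseteq \cup\mathcal{F}$. We have thus $\cup\mathcal{A}_X = X$ and so $\mathcal{A}$ spans
$\mathcal{F}$ (whether or not $\varnothing \in \mathcal{F}$). Let now $\mathcal{H}$ be another subfamily of $\mathcal{F}$ spanning $\mathcal{F}$. Take
any $Z \in \mathcal{A}$. Since $\mathcal{H}$ spans $\mathcal{F}$, there must be a subfamily $\mathcal{G}$ of $\mathcal{H}$ such that $\cup\mathcal{G} = Z$.
By Theorem 10 we must have $Z \in \mathcal{G} \subseteq \mathcal{H}$; this yields $\mathcal{A} \subseteq \mathcal{H}$. Thus, $\mathcal{A}$ is a minimal
family spanning $\mathcal{F}$ and so is the (unique) base of $\mathcal{F}$. □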
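Theorem 11 can be illustrated on the family (2) of Example 5 by computing the surmise function directly from Definition 9. This is a small sketch for explicitly listed finite families; the names `surmise` and `atoms` are hypothetical.

```python
def surmise(family):
    """Map each element x to the set of atoms at x (minimal members containing x)."""
    sets = {frozenset(s) for s in family}
    sigma = {}
    for x in frozenset().union(*sets):
        containing = [s for s in sets if x in s]
        sigma[x] = {s for s in containing if not any(t < s for t in containing)}
    return sigma

def atoms(family):
    """All atoms of a finite union-closed family, plus the empty set if present."""
    result = set().union(*surmise(family).values())
    if frozenset() in {frozenset(s) for s in family}:
        result.add(frozenset())
    return result

# The family (2) of Example 5, written with one string per set.
EXAMPLE_5 = [set(s) for s in ["", "a", "b", "c", "ab", "ac", "bc", "cd",
                              "abc", "acd", "bcd", "abcd", "abcde"]]
```

Calling `atoms(EXAMPLE_5)` returns the six sets of the base listed in Example 5, as Theorem 11 predicts.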

Note in passing that, in the infinite case, there may not be an atom at every point
of the ground set $\mathcal{X} = \cup\mathcal{F}$ of a $\cup$-closed family $\mathcal{F}$. We already gave the example of the
collection of all the open sets of $\mathbb{R}$. Below is another, simple example.

12 A Counterexample. Consider the infinite family $\mathcal{F} = \mathcal{G} + \mathcal{H}$, with

(6)  $\mathcal{G} = \{G_n \mid G_n = \{\ldots, \tfrac{1}{n+1}, \tfrac{1}{n}\},\ n \ge 1\}$

(7)  $\mathcal{H} = \{H_n \mid H_n = G_n + \{1\},\ G_n \in \mathcal{G}\}$.

The family $\mathcal{F}$ is $\cup$-closed and well-graded and there is no atom at 1. It is easily verified
that this $\cup$-closed family $\mathcal{F}$ has no base. The $\cup$-closed family $\mathcal{H}$ is its own base and
has no atom at 1 either.



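We turn to Koppen (1998)'s result, which is formulated as the last statement in the
theorem below. We recall that $\mathring{\mathcal{B}} = \mathcal{B} \setminus \{\varnothing\}$ for the base $\mathcal{B}$ of a learning space.

13 Theorem. Suppose that $\mathcal{F}$ is a learning space with base $\mathcal{B}$ and surmise function $\sigma$.
Then $\{\sigma(x) \mid x \in \cup\mathcal{F}\}$ is a partition of $\mathring{\mathcal{B}}$ if and only if there is a tight path from $\varnothing$ to
$K$ for any $K \in \mathcal{B}$. Accordingly, $\mathcal{F}$ is well-graded if and only if $\{\sigma(x) \mid x \in \cup\mathcal{F}\}$ is a
partition of $\mathring{\mathcal{B}}$.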

Proof. Observation. By Theorem 11, for any $K \in \mathring{\mathcal{B}}$, there is some $y \in K$ such
that $K$ is an atom at $y$. Moreover, the hypothesis that $\{\sigma(x) \mid x \in \cup\mathcal{F}\}$ is a partition
of $\mathring{\mathcal{B}}$ and $|K| > 1$ implies that there exists, for any $y' \in K$ distinct from $y$, at least one
atom at $y'$ strictly included in $K$. (Otherwise, we would have $K \in \sigma(y) \cap \sigma(y')$.)

Assume that $\{\sigma(x) \mid x \in \cup\mathcal{F}\}$ is a partition of $\mathring{\mathcal{B}}$. Take any $K$ in $\mathring{\mathcal{B}}$ and suppose
that $|K| = n$. By the Observation, $K_n = K$ is an atom at $x_n$ for some $x_n \in K$. We
use induction on $n$. If $n = 1$, then $\varnothing, \{x_1\} = K$ is the tight path. Suppose that we
have a tight path $K_0 = \varnothing, K_1 = \{x_1\}, \ldots, K_j = \{x_1, \ldots, x_j\}$ from $\varnothing$ to $K_j \subset K_n = K$.
From the Observation, we know that there exists an atom $L_\ell$ at $y_\ell$ for any $y_\ell \in K \setminus K_j$,
with $1 \le \ell \le n - j$ and $L_\ell \subseteq K$. If $|K_j \cup L_\ell| > j + 1$ for some index $\ell$, then, again
by the Observation, there is some index $i \neq \ell$, $1 \le i \le n - j$, such that $L_i \subset L_\ell$ is
an atom at $y_i \in L_\ell \setminus K_j$, with $j + 1 \le |K_j \cup L_i| < |K_j \cup L_\ell|$. By elimination, we
have necessarily some $y_k \notin K_j$, $1 \le k \le n - j$, and an atom $L_k \subseteq K$ at $y_k$ such that
$K_j \cup \{y_k\} = K_j \cup L_k$. Defining $x_{j+1} = y_k$ and $K_{j+1} = K_j \cup \{y_k\}$, we obtain the
tight path $K_0 = \varnothing, K_1, \ldots, K_{j+1}$ from $\varnothing$ to $K_{j+1} \subseteq K$. Applying induction yields the
necessity in the first statement.

Conversely, assume that there is a tight path from $\varnothing$ to $L$ for any $L \in \mathcal{B}$. Suppose
that $\varnothing \neq K \in \sigma(x) \cap \sigma(y)$ for some $K \in \mathcal{B}$ and some distinct $x, y \in \cup\mathcal{F}$. A contradiction
ensues because no tight path

$K_0 = \varnothing, K_1 = \{x_1\}, \ldots, K_n = \{x_1, \ldots, x_n\} = K$

from $\varnothing$ to $K$ can exist. Indeed, we must have $x, y \in K \setminus K_{n-1}$ since $K$ is an atom at
both $x$ and $y$; and yet $|K| = n > n - 1 = |K_{n-1}|$.

The last statement of the theorem follows from the last statement in Theorem 7. □

As announced, this result does not generalize to the case in which the family $\mathcal{F}$
does not contain the empty set, even if we assume that the family is discriminative,
that is, satisfies the condition: for all $x, y \in \cup\mathcal{F}$,

$(\forall X \in \mathcal{F},\ x \in X \iff y \in X) \implies x = y$.

14 A Counterexample. Consider the family $\mathcal{K}$ defined by the base

$\mathcal{A} = \{\{x,y,c\}, \{y,d\}, \{c,d\}\}$.

We get the surmise function

$\sigma(x) = \{\{x,y,c\}\}$,  $\sigma(y) = \{\{x,y,c\}, \{y,d\}\}$,
$\sigma(c) = \{\{x,y,c\}, \{c,d\}\}$,  $\sigma(d) = \{\{y,d\}, \{c,d\}\}$.

It is easily checked that $\mathcal{K}$ is discriminative and well-graded; yet, the surmise
function does not define a partition of the base $\mathcal{A}$.
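The surmise function of Counterexample 14 can be recomputed from the base by brute force, which also confirms that its classes overlap. The following sketch is illustrative; the helper names (`span`, `surmise`, `classes_are_disjoint`) are hypothetical, and the check treats two classes as compatible only if they are equal or disjoint.

```python
def span(family):
    """All unions of nonempty subfamilies of `family`."""
    result = set()
    for s in map(frozenset, family):
        result |= {s | t for t in result} | {s}
    return result

def surmise(family):
    """Map each element to its atoms: the minimal members containing it."""
    sets = {frozenset(s) for s in family}
    sigma = {}
    for x in frozenset().union(*sets):
        containing = [s for s in sets if x in s]
        sigma[x] = {s for s in containing if not any(t < s for t in containing)}
    return sigma

def classes_are_disjoint(sigma):
    """True if any two distinct classes sigma(x), sigma(y) have no atom in common."""
    classes = {frozenset(c) for c in sigma.values()}
    return all(a == b or not (a & b) for a in classes for b in classes)

BASE_14 = [{"x", "y", "c"}, {"y", "d"}, {"c", "d"}]
SIGMA_14 = surmise(span(BASE_14))
```

Here `{x, y, c}` belongs to both `SIGMA_14["x"]` and `SIGMA_14["y"]`, which are different classes, so `classes_are_disjoint(SIGMA_14)` is false.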

Algorithms 

In the algorithms described in this section, we are given as input a family of sets $\mathcal{B}$,
which is purported to be the base of a $\cup$-closed family $\mathcal{F}$. We wish to test whether this
is true, and if so to determine other properties of $\mathcal{F}$ such as whether it is well-graded
or a learning space. In many cases the definitions given in earlier sections of this paper
may already be directly translated into algorithms, but a definition may be translated
into an algorithm in multiple ways, some more efficient than others; the content of
the results lies less in the pure existence of the algorithms and more in designing the
algorithms so they perform their tasks efficiently and in analyzing how much time they
take to run. The time for our algorithms should be polynomial in the size of our input,
if possible; this size is the sum of the cardinalities of the sets in $\mathcal{B}$. In particular, this
requirement for polynomial time precludes explicit construction of $\mathcal{F}$ as the span of $\mathcal{B}$,
as $\mathcal{F}$ may have exponentially greater size.

We assume a standard random-access-machine model of computation in which sim- 
ple arithmetic steps and memory access operations may be performed in constant time. 
The input to our algorithms will be families of sets. We assume that each set element 
is represented as an object that takes a constant amount of computer storage and with 
which additional information may be associated. For instance, a natural representation 
with these properties would be to represent the n elements of a set family as integers
in the range from 0 to n - 1; we may then associate information with each element by
using these integers as array indices. We represent an input set as a list of elements, 
and an input set family as a list of lists of elements. As is standard in the analysis of 
algorithms, we use O-notation to simplify the stated time bounds for our algorithms. 

15 Definition. In order to analyze and compare the running times of our algorithms,
we need parameters to describe the input size. We define $n$ to be the number of sets
in $\mathcal{B}$, $\ell$ to be the size of the largest set in $\mathcal{B}$, and $m$ to be the sum of cardinalities of
sets in $\mathcal{B}$. We say that an algorithm runs in polynomial time if its worst-case running
time can be upper bounded by a polynomial function of $\ell$, $m$ and $n$. For purposes of
comparing run times it is convenient to note that $\ell \le m \le n\ell$.

16 Definition. The endpoints of a set $X$ belonging to a base $\mathcal{B}$ are the elements of
the set

$X \setminus \bigcup_{Y \in \mathcal{B},\ Y \subset X} Y$.

That is, the endpoints of $X$ are the elements of $X$ that are not contained in any proper
subset of $X$ that belongs to $\mathcal{B}$. Equivalently, $x$ is an endpoint of a set $X$ in a base $\mathcal{B}$
if $X$ is an atom at $x$.

17 Lemma. There is an algorithm that takes as input a set family $\mathcal{B}$ and a set $X \in \mathcal{B}$,
and that outputs the endpoints of $X$, using time $O(m)$.

Proof. We associate with each element $x$ in $\cup\mathcal{B}$ a Boolean variable that is true
if and only if $x$ is in $X$; setting up these variables takes time $O(m)$. By examining
the value for $x$, we may test whether $x$ belongs to $X$ in constant time. For each set
$Y \in \mathcal{B}$, we use these bits to determine whether $Y \subset X$, by testing each of the members
of $Y$, in time $O(|Y|)$. By performing this test for all sets in $\mathcal{B}$, we may determine a
collection of the subsets of $X$ that are in $\mathcal{B}$, in total time $O(m)$. We then associate a
second Boolean variable with each member of $X$; initially we set all of these variables
to false. For each $Y$ in our collection of subsets of $X$, we loop through the elements
of $Y$, and set the Boolean variables associated with each of these elements to true.
Finally, we loop through the elements of $X$, and form a list of the elements for which
the associated Boolean value remains false. These elements are the endpoints of $X$.
The runtime of this algorithm is dominated by the steps in which we find the subsets
of $X$ and then use those subsets to mark covered elements of $X$; both of these steps
take $O(m)$ total time. □
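The marking procedure in the proof of Lemma 17 can be sketched as follows. This is an illustrative version, not the authors' code: the function name `endpoints` is hypothetical, Python sets stand in for the arrays of Boolean variables (giving expected rather than worst-case constant-time membership tests), and the input family is assumed to be a list of lists without repeated sets or repeated elements.

```python
def endpoints(base_family, X):
    """Elements of X not contained in any proper subset of X in `base_family`."""
    in_X = set(X)            # plays the role of the first Boolean variables
    covered = set()          # plays the role of the second Boolean variables
    for Y in base_family:
        Y = set(Y)
        # Y is a proper subset of X: every member lies in X and Y is smaller.
        if len(Y) < len(in_X) and all(y in in_X for y in Y):
            covered |= Y
    # The endpoints are the elements of X that were never marked as covered.
    return [x for x in X if x not in covered]
```

Each set of the family is scanned a constant number of times, so the work is proportional to the sum of the cardinalities, matching the $O(m)$ bound up to the hashing assumption.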

18 Theorem. Given a family $\mathcal{B}$ of sets, we may determine in time $O(nm)$ whether $\mathcal{B}$
is the base of a $\cup$-closed family $\mathcal{F}$.

Proof. We use Lemma 17 to calculate the endpoints of each $X \in \mathcal{B}$. By definition,
$X$ is an atom if and only if it is empty or has a nonempty set of endpoints; thus, by
Theorem 11, $\mathcal{B}$ is the base of its span if and only if every set in $\mathcal{B}$ is either empty or
has a nonempty set of endpoints. There are $n$ sets, each of which takes time $O(m)$ to
test, so the total time is $O(nm)$. □

19 Lemma. Suppose that set family $\mathcal{B}$ contains the empty set. Then $\mathcal{B}$ is the base of
a $\cup$-closed well-graded family if and only if each nonempty $X \in \mathcal{B}$ has one endpoint.

This is closely related to some results of Koppen (1998) (see also Doignon and Falmagne,
1999, Theorem 3.15, Condition (ii)).

Proof. If some $X \in \mathcal{B}$ has two or more endpoints $x$ and $y$, then $X$ belongs to both
$\sigma(x)$ and $\sigma(y)$, so the surmise function $\sigma$ is not a partition of $\mathcal{B}$. If some nonempty
$X$ has no endpoint, it is not an atom and not part of a base. Conversely, if every
nonempty $X \in \mathcal{B}$ has one endpoint, then $\sigma$ partitions $\mathcal{B}$ according to those endpoints.
The result follows from Theorem 13. □



20 Theorem. Given a family $\mathcal{B}$ of sets, we may determine in time $O(nm)$ whether $\mathcal{B}$
is the base of a learning space.

Proof. We first check that $\mathcal{B}$ contains the empty set; if not, it cannot be the base
of a learning space. Then, as in Theorem 18, we apply Lemma 17 to calculate the
endpoints of each $X \in \mathcal{B}$. By Lemma 19, $\mathcal{B}$ is the base of a $\cup$-closed well-graded family
if and only if each nonempty $X \in \mathcal{B}$ has exactly one endpoint. There are $n$ sets, each
of which takes time $O(m)$ to test, so the total time is $O(nm)$. □
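The tests of Theorems 18 and 20 reduce to counting endpoints, as the following sketch shows. The names `endpoints`, `is_base` and `is_learning_space_base` are hypothetical, hashing replaces the Boolean arrays of Lemma 17, and the input is assumed to be a list of lists with no repeated sets.

```python
def endpoints(family, X):
    """Elements of X not contained in any proper subset of X in `family`."""
    in_X = set(X)
    covered = set()
    for Y in family:
        if len(Y) < len(in_X) and all(y in in_X for y in Y):
            covered.update(Y)
    return [x for x in X if x not in covered]

def is_base(family):
    """Theorem 18: every set is empty or has at least one endpoint."""
    return all(len(X) == 0 or len(endpoints(family, X)) > 0 for X in family)

def is_learning_space_base(family):
    """Theorem 20: the empty set is present and every nonempty set has
    exactly one endpoint (Lemma 19)."""
    has_empty = any(len(X) == 0 for X in family)
    return has_empty and all(len(endpoints(family, X)) == 1
                             for X in family if len(X) > 0)
```

For instance, the base of Example 5 passes both tests, whereas the base of Counterexample 14 is a base but fails the second test because it lacks the empty set.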

21 Definition. For any set family $\mathcal{B}$ and any set $X \in \mathcal{B}$, let $\mathcal{B}/X$ denote the family
of sets $\{Y \setminus X \mid Y \in \mathcal{B}\}$.

22 Lemma. Let $\mathcal{B}$ be the base of a $\cup$-closed family $\mathcal{F}$. Then $\mathcal{F}$ is well-graded if and
only if, for each $X$ in $\mathcal{B}$, the family $\mathcal{B}/X$ spans a learning space.

Proof. A tight path in $\mathcal{F}$ from $X$ to some set $Y \supseteq X$ corresponds (via set-theoretic
difference of each path member with $X$) to a tight path in $\mathcal{F}/X$ from the
empty set to $Y \setminus X$. Conversely, a tight path in $\mathcal{F}/X$ from the empty set to $Y \setminus X$
corresponds (via set-theoretic union of each path member with $X$) to a tight path in
$\mathcal{F}$ from $X$ to $Y$. The result follows from Theorem 7. □

23 Theorem. Given a family $\mathcal{B}$ of sets, we may determine in time $O(n^2 m)$ whether
$\mathcal{B}$ is the base of a $\cup$-closed well-graded family.

Proof. We may first test whether $\mathcal{B}$ is a base by Theorem 18. Next, for each
$X \in \mathcal{B}$, we form the set $\mathcal{B}_X$ consisting of the empty set and the sets in $\mathcal{B}/X$ that have
a nonempty set of endpoints with respect to $\mathcal{B}/X$, and test whether $\mathcal{B}_X$ is the base of
a learning space by Theorem 20. $\mathcal{B}$ itself is the base of a $\cup$-closed well-graded family
if and only if each $\mathcal{B}_X$ passes this test, by Lemma 22. There are $n$ sets $\mathcal{B}_X$, each takes
time $O(nm)$ to construct and test, and so the total time bound is $O(n^2 m)$. □
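The procedure in the proof of Theorem 23 can be sketched compactly with Python frozensets. This is a hedged illustration rather than the authors' implementation: the names `endpoints` and `is_well_graded_base` are hypothetical, and the subset tests use Python's built-in set comparison instead of the Boolean arrays of Lemma 17, so the stated time bound is not claimed for this code.

```python
def endpoints(family, X):
    """Elements of X not covered by proper subsets of X that belong to `family`."""
    covered = set()
    for Y in family:
        if Y < X:              # proper-subset test on frozensets
            covered |= Y
    return X - covered

def is_well_graded_base(family):
    """Test whether `family` is the base of a union-closed well-graded family."""
    B = {frozenset(X) for X in family}
    # B must itself be a base: every nonempty set needs an endpoint (Theorem 18).
    if any(X and not endpoints(B, X) for X in B):
        return False
    for X in B:
        quotient = {Y - X for Y in B}                       # the family B/X
        # Keep the sets with an endpoint and add the empty set: this is B_X.
        B_X = {Y for Y in quotient if endpoints(quotient, Y)} | {frozenset()}
        # Learning-space test (Theorem 20): exactly one endpoint per nonempty set.
        if any(Y and len(endpoints(B_X, Y)) != 1 for Y in B_X):
            return False
    return True
```

On the base of Counterexample 14, which does not contain the empty set, the test succeeds, consistent with that family being well-graded.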

We now consider the situation in which $\mathcal{B}$ is not itself the base of a well-graded
family. Can we modify $\mathcal{B}$ to produce a well-graded family that is as close as possible,
in some sense, to the span of $\mathcal{B}$?

24 Definition. A minimal well-graded extension of a family of sets $\mathcal{B}$ is a well-graded
$\cup$-closed set family $\mathcal{F}$ such that $\mathcal{B} \subseteq \mathcal{F}$, and such that no $\cup$-closed $\mathcal{F}'$ with $\mathcal{B} \subseteq \mathcal{F}' \subset \mathcal{F}$
is well-graded. A path family for a family of sets $\mathcal{B}$ is a set of paths $\pi_{K,L}$ with $K$ and $L$
in $\mathcal{B}$, each leading from $K$ to $K \cup L$ as in Theorem 7; $\pi_{K,L}$ may use sets not belonging
to the span of $\mathcal{B}$, but is required to be a tight path in the power set of $\cup\mathcal{B}$. We observe
that the length of a path $\pi_{K,L}$ is at most the cardinality of $L$, and therefore that the total
length of all paths in a path family is $O(nm)$. A path extension $\mathcal{F}$ of $\mathcal{B}$ is formed from a
path family by letting $\mathcal{B}'$ consist of all the sets occurring on paths $\pi_{K,L}$ and letting $\mathcal{F}$
be the span of $\mathcal{B}'$.

25 Lemma. Any path extension is well-graded. 

Proof. We show that, for every $K$ and $L$ in $\mathcal{B}'$, where $\mathcal{B}'$ is the family of sets
occurring on paths $\pi_{X,Y}$ in a path extension of $\mathcal{B}$, there exists a tight path in the
span of $\mathcal{B}'$ from $K$ to $K \cup L$.

Thus, suppose $K$ belongs to a path $\pi_{A,B}$ and $L$ belongs to a path $\pi_{C,D}$. To form a
tight path from $K$ to $K \cup L$ in $\mathcal{F}$, we concatenate the following three paths:

1. a tight path from $A$ to $K$ along path $\pi_{A,B}$,

2. a tight path from $K$ to $K \cup C$, formed by the union of $K$ with the sets in path
$\pi_{A,C}$, and

3. a tight path from $K \cup C$ to $K \cup L$, formed by the union of $K \cup C$ with the sets
in the portion of path $\pi_{C,D}$ that extends from $C$ to $L$.

When this concatenation would cause the same set to appear repeatedly, we discard
the duplicate sets. It is straightforward to verify that each set in this concatenation of
paths belongs to the span of $\mathcal{B}'$. Thus, we can form a tight path from any $K$ to $K \cup L$
in this span, and therefore, by Theorem 7, the span is well-graded. □

26 Lemma. Any minimal well-graded extension is a path extension.

Proof. Let $\mathcal{F}$ be a minimal well-graded extension of $\mathcal{B}$. Then by Theorem 7 we
can find a path family for $\mathcal{B}$, such that each set occurring in each path belongs to $\mathcal{F}$.
By Lemma 25 the corresponding path extension is well-graded, and it is a subfamily
of $\mathcal{F}$ and contains every set in $\mathcal{B}$. By the minimality of $\mathcal{F}$, this path extension must
coincide with $\mathcal{F}$. □
It is straightforward to combine the results above in an algorithm that finds a
minimal well-graded extension of any set family in polynomial time: construct a path
family arbitrarily, and then for each set in its base, determine whether the set can be
removed by testing the result of the removal for well-gradedness, using Theorem 23 to
do these tests. However, the polynomial time bound of this algorithm would be large.
We now describe a more efficient algorithm for the same task, based on a more careful
choice of path family.

27 Theorem. Given any family of sets $\mathcal{B}$, we can find a minimal well-graded extension
of $\mathcal{B}$ in time $O(nm\ell + n^3 m)$.

Proof. We simultaneously form the paths $\pi_{K,L}$ in a path family, and a superset $\mathcal{B}'$
of $\mathcal{B}$ that includes a base for our eventual well-graded extension, by adding sets in order
by the cardinality of the sets. At step $i$ of the process, we include sets of cardinality $i$
into $\mathcal{B}'$, taking care as we do that all the sets we add are necessary for well-gradedness.
As we do so, we maintain the following data:

• $S_{K,L}$ is the union of all sets $X \in \mathcal{B}'$ such that $X \subseteq K \cup L$ and $X \cup K \neq K \cup L$.

• $c_{A,B,C,D}$ is a Boolean value, true if and only if $S_{A,B} \subseteq C \cup D$.

In the $i$th step of the algorithm, we consider the set $\Pi_i$ of all paths $\pi_{K,L}$ such that
$|S_{K,L}| = i - 1$ and such that $|K \cup L| \ge i$. We will add sets of cardinality $i$ to $\mathcal{B}'$, of
the form $S_{K,L} \cup \{x\}$ for some $x$ in $(K \cup L) \setminus S_{K,L}$, in order to allow one more step on
each path. However, note that if $S_{A,B} = S_{C,D}$ then a single set of this type may allow
an additional step for multiple paths.

We observe that, if $\pi_{A,B}$ and $\pi_{C,D}$ are both in $\Pi_i$, then $c_{A,B,C,D}$ is true if and only
if $S_{A,B} = S_{C,D}$. Thus the relation $c$ can be viewed as an equivalence relation on the
paths in $\Pi_i$. As part of our calculation in step $i$ of the algorithm, we construct the
equivalence classes of this equivalence relation.

For each equivalence class, we form a bipartite graph $(U, V, E)$. Here $U$ consists
of pairs $(K, L)$ corresponding to paths $\pi_{K,L}$ in the equivalence class. $V$ consists of
elements $x$ in the sets $(K \cup L) \setminus S_{K,L}$, for paths $\pi_{K,L}$ in the equivalence class. We draw
an edge from $(K, L)$ to $x$ if $x \in (K \cup L) \setminus S_{K,L}$. We find a minimal subset of $V$ that
dominates every vertex of $U$ in this graph; this gives us a minimal family of sets that
we can add to $\mathcal{B}'$ in order to take another step on each path in the equivalence class.
This minimal dominating set can be found by repeatedly either including in it a vertex
in $V$ that is the only neighbor of some vertex in $U$ or, if no such vertex in $U$ exists,
removing from the graph an arbitrarily chosen vertex in $V$; the total time to perform
this step is proportional to the size of the graph.
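The dominating-set step can be sketched as below. This is a simplified variant offered as an illustration, not the authors' linear-time procedure: the function name and the dictionary representation of the bipartite graph are assumptions, vertices of $V$ are scanned in a fixed order rather than chosen adaptively, and every vertex of $U$ is assumed to have at least one neighbour. A vertex is kept exactly when it has become the only remaining neighbour of some undominated vertex of $U$; otherwise it is deleted.

```python
def minimal_dominating_subset(adjacency):
    """`adjacency` maps each vertex u of U to the set of its neighbours in V.
    Returns a subset of V, minimal under inclusion, dominating every u."""
    # Undominated vertices of U with their not-yet-deleted neighbours.
    live = {u: set(vs) for u, vs in adjacency.items()}
    candidates = sorted(set().union(*live.values()))
    chosen = set()
    for v in candidates:
        if any(vs == {v} for vs in live.values()):
            # v is the only remaining neighbour of some u: it must be included.
            chosen.add(v)
            live = {u: vs for u, vs in live.items() if v not in vs}
        else:
            # Every undominated u keeps another neighbour, so v can be deleted.
            for vs in live.values():
                vs.discard(v)
    return chosen
```

Each chosen vertex is the sole surviving neighbour of some vertex of $U$ at the moment it is chosen, so no chosen vertex can be dropped afterwards. The scan as written is quadratic in the worst case; the time proportional to the graph size stated in the proof would require additional bookkeeping of neighbour counts.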

Once we have found these sets to add to $\mathcal{B}'$, we must update the data we are
maintaining so that we may repeat this computation for a larger value of $i$. Whenever
we add a set corresponding to a path $\pi_{A,B}$ and element $x$, we examine all paths
$\pi_{C,D}$ for which $c_{A,B,C,D}$ is true. If $x \in C \cup D$, we include $x$ as a new member of $S_{C,D}$. (In
particular, $c_{A,B,A,B}$ will always be true, and we will always include $x$ as a new member
of $S_{A,B}$.) However, if $x \notin C \cup D$, we instead set $c_{A,B,C,D}$ to false.

We now analyze the running time of this algorithm: 
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• We may compute the initial value of each set Sk,l i n time 0(m) , simply by testing 
each other set X in time 0(|X|), after an initial + \L\) time preprocessing 
stage to construct data structures for testing membership in K U L. and L\K. 
Thus, we may construct all such sets in time 0(n 2 m). 

• We may compute the initial value of ca,b,c,d in time 0(\A\ + \B\ + \C\ + \D\). 
Adding this up over all quadruples A, B, C, D produces a runtime of 0(n 3 m). 

• Identifying IT takes time 0(n 2 ). There are 0(m) steps of the algorithm, so the 
total time for this identification is 0(n 2 m). 

• The sum of the cardinalities of the sets Ilj, summed over all i, is 0(nm), because 
each time we include a set in Ilj we take a step on the corresponding path, and 
the total length of all paths in a path family is 0(nm). We may identify the 
equivalence class of a single path in time 0(n 2 ); therefore, the total time to 
construct equivalence classes, throughout the course of the algorithm, is 0(n 3 m). 

• Each vertex in U in the bipartite graph constructed for an equivalence class may 
have O(ℓ) neighbors. Therefore, the total size of all bipartite graphs so 
constructed, and the total time to find dominating sets in these graphs, is O(nmℓ). 

• Each set added to B' can be constructed explicitly, as a list of elements, from 
the data structures we already have, in time O(ℓ). Thus, the total time to list 
all these sets is O(nmℓ). In addition, for each such set, we spend O(n^2) time 
examining the paths for which c_{a,b,c,d} is true, the total for which over the course 
of the algorithm is O(n^3 m). 

Thus, the total time for all of these steps is O(nmℓ + n^3 m). □ 

It may be seen as a flaw in the completion algorithm described above that not 
every set in the input family B is necessarily part of the base of the output family T it 
produces. Given B, can we find a wg-family T such that every set in B is part of the 
base of T? Unfortunately, as we now show, this problem appears to be intractable. 

28 Theorem. It is NP-complete, given a set family B, to determine whether there 
exists a well-graded U-closed set family T such that B is a subset of the base of T . 

Proof. If a family T satisfying this requirement exists, we can choose T to be minimal 
and therefore, by Lemma 26, a path extension. Thus, we can test in NP whether T 
exists by nondeterministically choosing a path extension, applying Lemma 17 to find 
the endpoints of all sets included on paths in the extension, and verifying that each 
member of B has an endpoint. Therefore, determining whether T exists belongs to 
NP, the easier part of proving that it is NP-complete. 

To finish the NP-completeness proof, we give a reduction from a known NP-complete 
problem, 3-satisfiability (Garey and Johnson, 1979). A 3-satisfiability instance 
consists of a set of variables V, the set of their complements V̄ = {v̄ | v ∈ V}, and 
a set C of clauses, where each clause is a set of three terms, and where a term is any 
element of V ∪ V̄. A truth assignment is any function f from V to {0, 1}; we may 
extend f to the domain V ∪ V̄ by f(v̄) = 1 − f(v). A truth assignment is satisfying 
if each clause of C contains at least one term mapped by f to 1, and a 3-satisfiability 
instance is satisfiable if and only if it has a satisfying assignment. 

From a 3-satisfiability instance (V, V̄, C) we form a set family B, the ground set 
of which will be the terms and clauses of the instance: ∪B = V ∪ V̄ ∪ C. For each 
variable v ∈ V, we include in B the set {v, v̄}, and for each clause c corresponding to 
the disjunction of three terms u, v, and w we include in B two sets, {c} and {c, u, v, w}. 
Additionally, we include in B the empty set. As we now show, the resulting set B forms 
a subset of the base of a well-graded U-closed set family T if and only if the given 
3-satisfiability instance is satisfiable. 
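
The construction of B can be made concrete with a short sketch. The encoding is an assumption chosen for illustration and is not part of the paper: a variable is a string such as "x", its complement is written "~x", a clause is a tuple of three terms, and the ground-set element standing for the clause with index i is the tuple ("clause", i). The names `negate` and `build_family` are likewise hypothetical.

```python
def negate(term):
    """Complement of a term in the assumed string encoding ("x" <-> "~x")."""
    return term[1:] if term.startswith("~") else "~" + term


def build_family(variables, clauses):
    """Build the set family B of the reduction as a set of frozensets."""
    family = {frozenset()}                      # the empty set
    for v in variables:
        family.add(frozenset({v, negate(v)}))   # the pair {v, complement of v}
    for index, terms in enumerate(clauses):
        c = ("clause", index)                   # ground-set element for the clause
        family.add(frozenset({c}))              # the singleton {c}
        family.add(frozenset({c, *terms}))      # the set {c, u, v, w}
    return family
```

For an instance with n variables and k clauses on distinct term sets, the family has 1 + n + 2k members, so the reduction is clearly computable in polynomial time.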

In one direction, suppose we have a satisfying truth assignment f. Let H = 
{{x} | x ∈ V ∪ V̄ and f(x) = 0}. Let t_i(c), for i ∈ {0, 1, 2}, map clauses to terms in 
such a way that c = {t_0(c), t_1(c), t_2(c)} and f(t_2(c)) = 1. Let T_0 = {{c, t_0(c)} | c ∈ C} 
and T_1 = {{c, t_0(c), t_1(c)} | c ∈ C}. We let T be the span of H ∪ B ∪ T_0 ∪ T_1. For 
each set {v, v̄} ∈ B there is a tight path in T from the empty set through {v} or 
{v̄}, according to whether v or v̄ is mapped by f to 0; the element of {v, v̄} not mapped to 
0 is an endpoint of {v, v̄}. For each set {c, t_0(c), t_1(c), t_2(c)} ∈ B there is a tight path 
in T from the empty set through sets {c}, {c, t_0(c)}, {c, t_0(c), t_1(c)}, and t_2(c) is an 
endpoint. The paths through {c}, {c, t_0(c)}, {c, t_0(c), t_1(c)} also form tight paths in T 
to each set in T_0 and T_1. Thus, every set in H ∪ B ∪ T_0 ∪ T_1 has a tight path in T from 
the empty set, so by Theorem 7 T is well-graded, and every nonempty set in B has an 
endpoint, so by Lemma 19 B is part of the base of T. Thus, we have shown that, if f 
is a satisfying truth assignment, B forms a subset of the base of a well-graded U-closed 
family. 
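
As a sanity check on this direction, the following sketch builds the generators for a one-clause instance and verifies by brute force that every nonempty set of B lies in the base of their span. It rests on assumptions made only for this example: terms are strings with "~" marking a complement, the clause element is the string "c0", and the base is taken to be the nonempty members that are not the union of the members properly contained in them. The helper names `span` and `base` are hypothetical, and well-gradedness itself is not tested here.

```python
def span(generators):
    """All unions of generators (brute force, exponential in general)."""
    closed = set(generators)
    frontier = list(closed)
    while frontier:
        s = frontier.pop()
        for g in generators:
            u = s | g
            if u not in closed:
                closed.add(u)
                frontier.append(u)
    return closed


def base(family):
    """Nonempty members that are not the union of their proper subsets."""
    return {s for s in family
            if s and frozenset().union(*(t for t in family if t < s)) != s}


# Clause c0 = {x, ~y, z} with the satisfying assignment x = 1, y = 1, z = 0.
c = "c0"
f = {"x": 1, "~x": 0, "y": 1, "~y": 0, "z": 0, "~z": 1}
t0, t1, t2 = "~y", "z", "x"          # ordered so that f(t2) = 1
B = {frozenset(), frozenset({"x", "~x"}), frozenset({"y", "~y"}),
     frozenset({"z", "~z"}), frozenset({c}), frozenset({c, t0, t1, t2})}
H = {frozenset({term}) for term in f if f[term] == 0}
T0 = {frozenset({c, t0})}
T1 = {frozenset({c, t0, t1})}

family = span(H | B | T0 | T1)
in_base = all(s in base(family) for s in B if s)
print(in_base)
```

If the singleton of a true term such as {x} were added as well, the clause set {c0, x, ~y, z} would become a union of smaller members and drop out of the base, which is the obstruction exploited in the converse direction.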

In the other direction, suppose that there exists a set family T that is a minimal 
path extension of B for which B is a subset of the base. Then, in order to have a 
tight path from the empty set to {v, v̄}, while not eliminating that set from the base, 
T must contain exactly one of the two sets {v}, {v̄}; form a truth assignment f in 
which we assign v the value 1 if {v̄} is in T and the value 0 if {v} is in T. This must 
be a satisfying assignment, for if a clause c had no term satisfying it then the set 
{c, u, v, w} would have no endpoints and therefore could not be part of the base of T. 
Thus, we have shown that, if B forms a subset of the base of a well-graded U-closed 
family, then the 3-satisfiability instance (V, V̄, C) has a satisfying assignment. 
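
Reading the truth assignment off such a family is mechanical, as the following sketch shows. It assumes the same illustrative encoding as above (a complement is the variable name prefixed with "~", and the family is a collection of frozensets); the function names `assignment_from_family` and `satisfies` are hypothetical.

```python
def assignment_from_family(variables, family):
    """Assign v the value 0 if {v} is in the family and 1 if {~v} is."""
    f = {}
    for v in variables:
        has_pos = frozenset({v}) in family
        has_neg = frozenset({"~" + v}) in family
        if has_pos == has_neg:
            raise ValueError(f"expected exactly one of {{{v}}}, {{~{v}}}")
        f[v] = 0 if has_pos else 1
    return f


def satisfies(f, clauses):
    """Check that every clause contains a term mapped to 1."""
    def value(term):
        return 1 - f[term[1:]] if term.startswith("~") else f[term]
    return all(any(value(t) == 1 for t in clause) for clause in clauses)
```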

We have described a polynomial time many-one reduction from the known NP- 
complete problem of 3-satisfiability to the problem of testing whether a set family is a 
subset of a base of a well-graded U-closed family, and we have shown that the latter 
problem is in NP. Therefore, it is NP-complete. □ 

29 Remark. Since the family B formed by this reduction contains the empty set, the 
same reduction shows that it is also NP-complete to determine whether a given set B 
is a subset of the base of a learning space. 

30 Remark. It is natural to desire, not just a minimal well-graded extension, but an 
extension that is minimum, either in the sense of having the smallest cardinality as a 
well-graded set family, in the sense of having the smallest cardinality base, in the sense 
of having the smallest number of additional sets added to the input family B, or in the 
sense of minimizing the sum of cardinalities of additional sets or of the base. We expect 
that these problems are computationally intractable, but do not have a hardness proof 
for them. The problems of minimizing base size at least belong to NP, but minimizing 
the cardinality of T may not since it involves counting the members of a family of sets 
that may be exponentially larger than the input. 